A Bounded Index for Cluster Validity

نویسندگان

  • Sandro Saitta
  • Benny Raphael
  • Ian F. C. Smith
چکیده

Clustering is one of the most well known types of unsupervised learning. Evaluating the quality of results and determining the number of clusters in data is an important issue. Most current validity indices only cover a subset of important aspects of clusters. Moreover, these indices are relevant only for data sets containing at least two clusters. In this paper, a new bounded index for cluster validity, called the score function (SF), is introduced. The score function is based on standard cluster properties. Several artificial and real-life data sets are used to evaluate the performance of the score function. The score function is tested against four existing validity indices. The index proposed in this paper is found to be always as good or better than these indices in the case of hyperspheroidal clusters. It is shown to work well on multidimensional data sets and is able to accommodate unique and sub-cluster cases.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A comprehensive validity index for clustering

Cluster validity indices are used for both estimating the quality of a clustering algorithm and for determining the correct number of clusters in data. Even though several indices exist in the literature, most of them are only relevant for data sets that contain at least two clusters. This paper introduces a new bounded index for cluster validity called the score function (SF), a double exponen...

متن کامل

Sum-of-Squares Based Cluster Validity Index and Significance Analysis

Different clustering algorithms achieve different results to certain data sets because most clustering algorithms are sensitive to the input parameters and the structure of data sets. Cluster validity, as the way of evaluating the result of the clustering algorithms, is one of the problems in cluster analysis. In this paper, we build up a framework for cluster validity process, meanwhile a sum-...

متن کامل

A cluster validity index for fuzzy clustering

Cluster validity indexes have been used to evaluate the fitness of partitions produced by clustering algorithms. This paper presents a new validity index for fuzzy clustering called a partition coefficient and exponential separation (PCAES) index. It uses the factors from a normalized partition coefficient and an exponential separation measure for each cluster and then pools these two factors t...

متن کامل

Development of An External Cluster Validity Index using Probabilistic Approach and Min-max Distance

Validating a given clustering result is a very challenging task in real world. So for this purpose, several cluster validity indices have been developed in the literature. Cluster validity indices are divided into two main categories: external and internal. External cluster validity indices rely on some supervised information available and internal validity indices utilize the intrinsic structu...

متن کامل

A Novel Validity Index for Determination of the Optimal Number of Clusters

The structural characteristics of clusters are investigated in the partitioning process. Two partition functions, which show opposite properties around the optimal cluster number, are found and a new cluster validity index is presented based on the combination of these functions. Some properties of the index function are discussed and numerical examples are presented. key words: clustering, val...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007